SUMMARY

Problem 

It is important that naval training keep up with changing requirements. 
As new weapon systems are introduced, curricula must be revised, new
manuals prepared, and instructors trained. Previously trained personnel
often are expected to keep up with new developments by independent study 
of technical manuals, without formal training. Sophisticated instruc- 
tional technology would be desirable but is seldom available. Programmed 
instruction and computer-assisted instruction require long lead times 
to develop. Some method is needed to assist personnel on the job in 
their independent study and review of technical materials. 

Objective

The objective of the project was to improve the independent study of
technical manuals and books by developing a kind of automatic CAI system 
called automatic question generation (AUTOQUEST) that does not require 
human frame-preparation. The system takes ordinary text and presents 
it to the student on a CRT device, a paragraph at a time. After the 
student studies the paragraph, he signals the computer that he is ready 
to be tested on it. The computer then generates a question to the 
student based on one of the sentences of the text. The student's
answer is evaluated and, if incorrect, the paragraph is shown to the
student again. 

Approach 

The approach to automatic question generation was entirely syntactic
rather than semantic. That is, only the form of a sentence, not its
meaning, influences the question generation. This method was chosen so
that the system would have enough generality to work on any body of 
English text, regardless of subject matter. The program compares a 
selected text sentence against a table of pre-stored patterns. If 
the sentence fits a pattern, a certain kind of question is generated 
from it. If the sentence does not fit any pattern in the table, it 
is ignored. 

Findings 

About 68% of the generated questions were satisfactory, while the 
remaining ones were ungrammatical or inappropriate. Most of the errors 
were due to the inability of the pattern-matcher to utilize all of the
syntactic information in the sentence. 

Recommendations

It is suggested that further work with automatic question generation
be continued when a satisfactory computer-based English parser with
good error recovery becomes available to preprocess the text sentences.
Improved parsers are under development by other research workers and
should be available in 1 or 2 years. The economic feasibility of this
approach is discussed and is projected to be under one dollar per
student-hour within 5 years when individual stand-alone minicomputers
become available which are capable of running programs written in the
LISP programming language.

INTRODUCTION



Problem 

It is important that naval training keep up with changing requirements.
As new weapon systems are introduced, curricula must be revised, new man- 
uals prepared, and instructors trained. Previously trained personnel often 
are expected to keep up with new developments by independent study of tech- 
nical manuals, without formal training. Sophisticated instructional tech- 
nology would be desirable but is seldom available. Programmed instruction 
and computer-assisted instruction require long lead times to develop. 
Some method is needed to assist personnel on the job in their independent 
study and review of technical materials. 

Purpose 

This report describes an experimental computer-based educational 
system called automatic question generation (AUTOQUEST) for assisting in- 
dependent study of written text. Studies of reading comprehension have 
shown that retention of material is enhanced if the student is periodically 
required to answer questions about what he has read (Anderson & Biddle, 
1975; Anderson et al., 1974; Alessi et al., 1974; Anderson et al., 1975a, 
1975b). This principle has been employed in computer-managed instruction, 
but it requires considerable human effort to prepare the questions. 

The goal of AUTOQUEST is to automatically generate questions from text
in order to improve independent study of any textual material. The
AUTOQUEST system presents text on a computer terminal to a student, a paragraph
at a time, and asks him questions about it, based on a randomly selected 
sentence contained in the paragraph. If the student's answer contains 
a certain percentage of the words in the original sentence, the student 
is told his answer is correct and he goes on to the next paragraph. If 
the answer is judged wrong, the paragraph is displayed again and another 
question is generated. 

The research was almost entirely directed toward the development of 
the techniques for generating questions. Issues of economy or efficiency 
of programs did not receive much attention. Nor was the project concerned 
with running subjects to determine the pedagogical effectiveness of the 
system when compared with unsupervised study or conventional CAI. The 
project was devoted to determining the feasibility of automatically
producing a man-machine dialogue by natural language processing techniques 
with minimal preprocessing of text by human beings. 

Economics and Computer Technology 

Since one of the motivations of this research is to save the cost 
of human-generated questions, it is appropriate to discuss the economic 
feasibility of AUTOQUEST. At present, it requires about 6 minutes of CPU 
time on a Digital Equipment PDP-10 per student-hour of instruction. This 
works out to about $40 per student-hour, not including telephone charges 
and computer terminal costs. The prospects for reducing the cost are 
discussed below.

The future of CAI seems to be in providing each student with small
individual computers. One such system, which has been announced within
the last 6 months, consists of a small computer with 32,796 bytes of core 
memory, a flexible disk drive containing 1/4 million bytes, a CRT terminal, 
and a hardcopy printer, all at a purchase price of $7900. Costs per stu- 
dent-hour are estimated to be under one dollar. AUTOQUEST is programmed 
in the computer language LISP, which is not yet available on such computers, 
but work is now underway to develop a microcoded version of LISP for small 
computers. Thus, we envision AUTOQUEST running on a small computer within 
3 years. The initial price of this computer will be about $50,000, but 
will eventually decline to under $10,000 (Horn and Winston, 1975). 

Video disk technology will have a significant impact on the economics 
of CAI. Within a year, video disk players for the home entertainment 
market will be available at prices near $600 (Walker, 1975). Each video
disk will hold from 30 to 60 minutes of television, containing as much 
as 100,000 still frames. If video disks could be adapted to digital text 
storage, several hundred books could be randomly accessed within a single 
disk costing less than $10. If this kind of disk player were connected 
to a computer, the student would have an information retrieval system 
capable of rapidly finding information contained in his library. An auto- 
mated tutorial program like AUTOQUEST could be added at very low marginal 
cost. 

Background 

The field of natural language processing by computers is too large
to be reviewed fully here. An excellent recent review by Walker is
available (Walker, 1973). However, I would like to describe briefly the papers
which have particularly influenced the present study. After several years
of relative inactivity, natural language processing took quite an upsurge
with the publication of Woods' Lunar Information System (Woods et al.,
1972) and Winograd's SHRDLU robot (Winograd, 1972). Woods' system uses
a transition network, in which a link is associated with an attempt to
find a particular type of phrase or word in order to transform a sentence
(usually a question) into its "deep structure" set of simple active
sentences. A semantic interpreter processes the deep structure and produces
an information retrieval program for answering the question. The generated
program searches a data base (originally having to do with lunar rocks)
and answers are returned to the questioner. Winograd's system is
characterized by a sophisticated interaction between a systemic grammatical
parser and a procedural semantic interpreter. The system allows a user to
interrogate the computer about a world of colored blocks and pyramids and to
program the computer to carry out instructions to move blocks into specified
places. Of the two systems, Winograd's is generally conceded to be more
powerful, while Woods' is thought to have greater generality. Both systems
have inspired a number of successors.

There has been a resurgence of interest in machine translation (Wilks,
1973) and case-frame representations of meaning (Schank and Colby, 1973).
The case-frame approach was used by Simmons and Smith (1974) in a project
closely related to the present one in its purpose. The investigators
showed how questions could be generated from text once the text had been
converted to a deep case-frame representation. Unfortunately, they found 
that no existing or projected computer program could produce the case 
representations from unrestricted text, so that human preprocessing of 
text was required for their question-generator to work. Hopefully, a 
computer program will someday be developed to replace the human editing 
presently required by this system. 

One of the first approaches to natural language processing was Weizen- 
baum's Eliza program (Weizenbaum, 1966), in which the computer simulated 
a nondirective psychotherapist. The program uses pattern matching keyed 
to certain words in the patient's conversation with no real understanding 
of the content. For example, if the patient said, "I am very unhappy these 
days," the computer would notice the words "I am" and generate "How long 
have you...," followed by the remainder of the patient's statement so as 
to produce the question, "How long have you been very unhappy these days?"
A more sophisticated pattern-matching system to simulate paranoia was
developed by the psychiatrist Colby (Colby, Parkinson, and Faught, 1974).
For reasons to be described later in this report, the pattern-matching 
approach is the one used by the AUTOQUEST system. 

APPROACH



The approach to automatic question generation was entirely syntactic 
rather than semantic. That is, only the form of a sentence is treated, 
not its meaning. In this manner, a general system could be developed which 
would work on any body of English text, regardless of subject matter.
If a semantic approach had been employed, the project would have been
restricted to one or two specialized subjects and a great amount of effort
would have been spent in developing semantic models for those subjects.

A pure syntactic approach has many limitations, of course. First, 
the student is required to give verbatim parts of the original text in his 
answers. Second, it is well known that many English sentences are syntac- 
tically ambiguous and can only be parsed correctly when their meaning is 
taken into account. It was our hope (justified later by our results) that 
these problems would not occur so frequently that a useful system would 
be infeasible.

The project's first attempt at syntactic question generation was a 
two-stage process in which a text sentence was put through a parser and 
the parsed sentence reassembled into the form of a question. For the 
parsing, we used a version of Woods' Augmented Transition Network Parser
which was available on a computer at the University of California (UC)
at Irvine. It was a slightly scaled-down version of his experimental Lunar 
Sciences Natural Language system with the so-called NASA grammar. The system 
is one of the most successful natural-language programs ever developed. 
It proved to be extremely useful in helping geologists retrieve information 
about lunar rocks. Unfortunately, a number of problems were encountered 
in adapting it to this work. The parsing program occupied so much computer 
memory that it could only be run after midnight at UC Irvine. Further, 
50% of the text sentences failed to parse and gave error messages indicating 
they needed more memory. Of the remaining 50% parsed sentences, only 60% 
appeared to have been parsed correctly. As a check, one of the sentences 
that failed at UC Irvine was run at Bolt, Beranek, and Newman, Inc. (BBN) 
with the original version of the parser. The program ground away for 
over an hour before returning a message that space requirements had been 
exceeded. 

This experience was sufficiently discouraging that it was decided not 
to try to adapt any existing general-purpose parser to this project.
Instead, the decision was made to develop a specialized pattern-matching
program that compares a sentence against a table of pre-stored patterns.
If the sentence fits a pattern, a certain kind of question is generated
from it. If the sentence does not fit any pattern in the table, it is
ignored.

The following sentence, taken from a computer programming manual, helps 
to illustrate the process: "The dd name identifies a DD statement so that 
subsequent control statements and the data control block in the processing 
program can refer to it." The sentence matches a pre-stored pattern in
the program of the form: "S1 so that S2."



The first part of the sentence, S1, is scanned to locate the verb.
If the first verb found is an auxiliary, such as "is," "was," "were,"
"do," etc., it is moved to the front of S1. Otherwise, the tense and
number of the verb are examined and an appropriate auxiliary created at
the front of S1. The transformed S1 is called QFORM(S1). The generated
question is: "Why QFORM(S1)?" The computer also generates an expected
answer: "So that S2." Applying these rules to the text example, we get
the question, "Why does the dd name identify a DD statement?" with the
expected answer, "So that subsequent control statements and the data
control block in the processing program can refer to it."
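
The "S1 so that S2" rule can be sketched in a few lines of Python (the original system was written in LISP). This is only an illustration under stated assumptions: the word lists AUXILIARIES and VERB_STEMS, the helper third_singular_stem, and the function names are hypothetical stand-ins for the report's dictionary and routines, and only auxiliaries and third person singular present verbs are handled.

```python
import re

# Illustrative stand-ins for the report's dictionary; not the original word lists.
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can", "will"}
VERB_STEMS = {"identify", "refer"}

def third_singular_stem(word):
    """Return the stem of a third-person-singular form such as 'identifies'."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return None

def qform(clause):
    """Invert subject and verb: front an auxiliary, otherwise supply 'does'."""
    words = clause.split()
    if not words:
        return None
    if words[0] in {"The", "A", "An"}:
        words[0] = words[0].lower()  # the clause no longer starts the sentence
    for i, word in enumerate(words):
        if word.lower() in AUXILIARIES:
            return " ".join([word.lower()] + words[:i] + words[i + 1:])
        stem = third_singular_stem(word.lower())
        if stem in VERB_STEMS:
            return " ".join(["does"] + words[:i] + [stem] + words[i + 1:])
    return None  # no verb recognized

def why_question(sentence):
    """Match 'S1 so that S2' and return (question, expected_answer), or None."""
    match = re.match(r"(.+?)\s+so that\s+(.+?)\.?$", sentence)
    if match is None:
        return None  # the sentence does not fit the pattern and is ignored
    inverted = qform(match.group(1))
    if inverted is None:
        return None
    return f"Why {inverted}?", f"So that {match.group(2)}."
```

Called on the dd name sentence quoted above, why_question returns the question and expected answer given in the text; a sentence without "so that" returns None, mirroring the rule that unmatched sentences are ignored.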

The student's answer is checked to see if more than 50% of the long
words (more than 4 letters) of the expected answer are contained in his
answer. In this example, he has to come up with at least four of the
words in the list (subsequent control statements block processing program)
in order to have his answer judged correct. The 50% criterion allows
the student some flexibility beyond a complete verbatim requirement.
The restriction to long words was designed to eliminate common English
words and increase the percentage of content-specific words.

If the expected answer contains a conjunction, such as "and" or "or,"
then the student's answer will be judged partly right if it is correct
for part of the answer on either side of the conjunction. For example,
if the student replied, "So that subsequent control statements can refer
to it," the computer would come back saying, "Yes; subsequent control
statements and the data control block in the processing program can refer
to it. Your answer is partly right."
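
The answer check can be sketched as follows. The report does not say exactly how the 50% rule and the conjunction rule are combined, so this sketch makes an assumption: when the expected answer contains "and" or "or," each side is scored separately, and the answer is "correct" if both sides pass, "partly right" if one passes. The function names are hypothetical, and the word list the sketch derives may differ slightly from the one printed in the report.

```python
import re

def long_words(text):
    """Distinct lowercase words of more than four letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 4}

def passes(expected, answer):
    """True if more than 50% of the expected long words occur in the answer."""
    target = long_words(expected)
    if not target:
        return False
    return len(target & long_words(answer)) / len(target) > 0.5

def judge(expected, answer):
    """Return 'correct', 'partly right', or 'wrong'."""
    # Assumption: split once at the first "and"/"or" and score each side.
    parts = re.split(r"\s+(?:and|or)\s+", expected, maxsplit=1)
    if len(parts) == 1:
        return "correct" if passes(expected, answer) else "wrong"
    hits = sum(passes(part, answer) for part in parts)
    return {2: "correct", 1: "partly right", 0: "wrong"}[hits]
```

With the expected answer from the dd name example, the reply "So that subsequent control statements can refer to it." covers the part before "and" but not the part after it, so judge returns "partly right", as in the dialogue quoted above.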



DESCRIPTION OF METHODS 

The Question Generation Algorithms 

Vocabulary. The program uses a dictionary of articles, pronouns,
prepositions, conjunctions, and about 16,000 verb stems. A morphological
analysis subroutine strips off the endings of a word such as -ing, -s, -ed,
-es and tests if the remainder of the word is in the verb list. The
verbs are classified as transitive or intransitive, regular or irregular,
and verb, verb-noun, verb-adjective, or verb-adjective-noun. For example,
the word "control" is a verb-noun since it can be used either as a verb
or as a noun; it is also transitive and regular. The word "bring" is
always a transitive irregular verb.

A list of common adverbs also appears in the vocabulary. In addition,
words ending in -ly which had not been assigned another part of speech
were classified as adverbs.

Since nouns can modify nouns, as in the phrase, "job control language,"
no distinction was made between adjectives and nouns in the lexicon.
Any words which were not in the vocabulary were automatically classified
as nouns whenever they appeared in a sentence.
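
The lookup order described above (dictionary words, then verb forms found by stripping endings, then -ly adverbs, then noun by default) might look like this in Python. The tiny VERB_STEMS and FUNCTION_WORDS tables and the tag names are hypothetical; the real dictionary is far larger, and this sketch does not handle spelling changes such as "identifies" to "identify."

```python
# Tiny stand-in lexicon; the report's dictionary is not reproduced here.
VERB_STEMS = {"control", "bring", "process", "apply"}
FUNCTION_WORDS = {"the": "DET", "a": "DET", "of": "PREP", "in": "PREP",
                  "and": "CONJ", "it": "PRON"}
SUFFIXES = ("ing", "ed", "es", "s")

def verb_stem(word):
    """Return the verb stem if the word, or the word minus an ending, is a known verb."""
    if word in VERB_STEMS:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in VERB_STEMS:
            return word[: -len(suffix)]
    return None

def classify(word):
    """Assign a coarse part of speech; unknown words default to NOUN."""
    word = word.lower()
    if word in FUNCTION_WORDS:
        return FUNCTION_WORDS[word]
    if verb_stem(word) is not None:
        return "VERB"
    if word.endswith("ly"):  # only reached if no other part of speech applied
        return "ADV"
    return "NOUN"
```

Checking the verb list before the -ly rule keeps a verb such as "apply" from being tagged as an adverb, which matches the statement that only -ly words without another part of speech are treated as adverbs.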

Pattern-matching (QGENR). Given a sentence in the form of a list,
QGENR tries to generate a question from it and an expected answer. QGENR 
consists of three parts: (1) a preprocessor, (2) a pattern-matcher, 
and (3) a post-processing filter. 

Preprocessing Stage. If the first word of the sentence says
"Goodbye," "Quit," etc., then return, "I have enjoyed working with you.
Goodbye."

If the sentence contains a colon, reject it. (Sentences with
colons were often found to contain subsentences or complex phrases which
the pattern matchers could not easily handle.)

If the sentence has more than 35 words, reject it. The figure
"35" was arbitrarily chosen in order to eliminate the most complex and
unwieldy sentences while permitting a fair number of sentences to remain
for further processing.

If there are fewer than four words preceding a comma, which
include a preposition or filler word like "nevertheless," "however," etc.,
strip them off the beginning of the sentence.

If the sentence begins with a comma or filler word, strip it off.

Mark all the words ending in -ly as adverbs.

Apply the function SETNOUN to every word in the sentence. This
function first tries to see if the word is already in the vocabulary or
is a verb form of a known verb. If so, it is marked as such; otherwise,
it is marked as a noun.

If the first word in the sentence is a verb, the sentence is
rejected.

The sentence is scanned for two verbs with the same tense and
person connected by the word "and." If found, all the words are deleted
starting with the first verb up to and including the "and."

The word preceding the verb is marked as a noun. Then the
subroutine CHECKVERBS is executed to mark excess verb forms as nouns.
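
The purely textual preprocessing steps (colon rejection, the 35-word limit, and stripping of short leading phrases and filler words) can be sketched as below. The word lists are illustrative guesses, not the report's, and the steps that need the dictionary (SETNOUN, the verb checks, CHECKVERBS) are left out.

```python
# Illustrative word lists; the report does not enumerate them.
FILLERS = {"nevertheless", "however", "therefore", "thus"}
PREPOSITIONS = {"in", "on", "at", "for", "with", "by", "of"}
PUNCTUATION = {",", ".", ";", "?", "!"}
MAX_WORDS = 35

def preprocess(tokens):
    """Return the cleaned token list, or None if the sentence is rejected.

    `tokens` is a list of words with punctuation marks as separate tokens.
    """
    if ":" in tokens:
        return None  # sentences with colons are rejected outright
    if sum(t not in PUNCTUATION for t in tokens) > MAX_WORDS:
        return None  # too long to be handled reliably
    tokens = list(tokens)
    if "," in tokens:
        cut = tokens.index(",")
        lead = [t.lower() for t in tokens[:cut]]
        # A short introductory phrase containing a preposition or filler word.
        if len(lead) < 4 and any(w in FILLERS or w in PREPOSITIONS for w in lead):
            tokens = tokens[cut + 1:]
    while tokens and (tokens[0] == "," or tokens[0].lower() in FILLERS):
        tokens = tokens[1:]
    return tokens
```

For example, the tokens of "In this case , the verb is moved ." come back without the leading "In this case ," while a sentence containing a colon returns None.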

Pattern-matching Question Formation Stage. A list of patterns
was developed by generalizing from the sentence features which I found
myself using when I generated questions from text. Each pattern seemed
to be associated with a particular rule for generating a question. The
list of patterns is not claimed to be complete, optimal, or always correct.

The program checks the patterns given in Table 1 in sequence until
a pattern is found which matches the sentence. Then the corresponding
questions and answers are generated. Although the table does not show
adverbs, they may optionally appear next to verbs in the sentence patterns.
The program first checks to see if any subordinate conjunctions occur.
If they do, only the first set of patterns in the table is handled. The
question is generated from the main part of the sentence and the
subordinate clause is appended to the end of the question.

Some comments on the patterns are in order. Pattern D really only
works where the verb in Y is subjunctive. Pattern B could probably be
eliminated, since G also generates a sensible question for infinitive
verb complements. F is an attempt to handle passive voice. H, J, K, and
M 2 are all questions about the subject of the sentence which ask for
the adjectives, prepositional phrases, or relative clauses which modify
the principal noun in the subject noun phrase. Pattern I asks for the
prepositional phrase which modifies the main verb (which must be passive
or intransitive).

Pattern L works only when the "that" clause follows the main verb
of the sentence and is the object of it, as in "the investigation showed
that..." Pattern I asks for the object of the verb when the subject
contains certain pronouns because to ask for the subject would be to invite
a trivial answer.

Post-processing Stage. In order to screen out questions and
answers which are likely to be of poor quality, several post-processing
filters are applied to the generated question-answer pairs. The filters
were somewhat arbitrary rules designed to eliminate the most complex
questions and answers.

The program rejects the Q-A pair if any of the following conditions 
are true: 

1. Either the question or answer is longer than 17 words. 

2. The word "and" appears in the question. 

3. The first word of the answer is a relative pronoun or subordinate
conjunction.

4. Certain pronouns appear in the question.

5. Certain pronouns appear in the answer and the answer has fewer
than five words.

6. There is a comma in the question.

7. An error was encountered previously in the subject-verb
inversion routine QFORM.
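
The seven rejection conditions translate directly into a filter function. In this Python sketch the sets PRONOUNS and ANSWER_STARTERS are assumptions, since the report speaks only of "certain pronouns" and does not list the relative pronouns and subordinate conjunctions it tests for; the qform_error argument stands in for the error flag set by QFORM.

```python
import re

# Assumed word lists; the report does not say which words it checks.
PRONOUNS = {"this", "these", "those", "they", "them"}
ANSWER_STARTERS = {"which", "who", "that", "because", "although", "when"}

def words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[A-Za-z0-9'-]+", text.lower())

def accept(question, answer, qform_error=False):
    """Return False if any of the seven rejection conditions holds."""
    q, a = words(question), words(answer)
    if len(q) > 17 or len(a) > 17:           # 1. too long
        return False
    if "and" in q:                            # 2. "and" in the question
        return False
    if a and a[0] in ANSWER_STARTERS:         # 3. answer starts with a clause marker
        return False
    if PRONOUNS & set(q):                     # 4. pronoun in the question
        return False
    if PRONOUNS & set(a) and len(a) < 5:      # 5. pronoun in a short answer
        return False
    if "," in question:                       # 6. comma in the question
        return False
    return not qform_error                    # 7. earlier inversion failure
```

A pair such as ("What does the program reject?", "Sentences with colons.") passes every test, while any question containing "and" or a comma is discarded.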

Finding the Verbs in the Sentence (CHECKVERBS). Possible verbs are
separated from verbal forms used as nouns by the following process:

1. If a "verb" is immediately preceded by a determiner, it is marked
as a noun.

2. If a "verb" immediately precedes another verb and the first verb
is not a copula, it is marked as a noun.

3. A verb following an auxiliary, or an auxiliary plus an adverb,
is retained.

4. A verb following the word "to" and possibly preceded by an adverb
is retained.

5. A verb following some form of "be" or "have" and possibly preceded 
by an adverb is retained as a verb. 

The number of remaining verbs is counted and compared with the number
of relative and subordinate conjunctions in the sentence plus one. If
the difference is zero, the program returns the sentence as a success.
If it is less than zero, the sentence is rejected as hopeless. If it
is greater than zero, the remaining verbs are scanned from right to left.
Present tense verbs which could be adjectives or nouns are marked as nouns
until the number of excess verbs is reduced to zero if possible. If
necessary, past tense forms are then marked as nouns until the number of
excess verbs is reduced to zero.
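
The counting step at the end of CHECKVERBS can be sketched as follows. The token representation (dictionaries with "tag", "tense", and "ambiguous" keys) and the tag names RELPRON and SUBCONJ are hypothetical choices made for this illustration; the report does not describe its data structures, nor what happens if excess verbs remain after both passes (this sketch simply returns the tokens).

```python
def resolve_excess_verbs(tokens):
    """Re-tag excess verbs as nouns; return None if the sentence is hopeless.

    `tokens` is a list of dicts with a 'tag' key; verbs also carry 'tense'
    ('present' or 'past') and 'ambiguous' (True if the form could also be
    a noun or adjective). The input list is not modified.
    """
    tokens = [dict(t) for t in tokens]
    verbs = [t for t in tokens if t["tag"] == "VERB"]
    # One verb is expected per clause: one main clause plus one per
    # relative pronoun or subordinate conjunction.
    clauses = sum(t["tag"] in {"RELPRON", "SUBCONJ"} for t in tokens) + 1
    excess = len(verbs) - clauses
    if excess < 0:
        return None
    for tense in ("present", "past"):
        for token in reversed(verbs):  # scan right to left
            if excess == 0:
                break
            if token["tag"] != "VERB" or token["tense"] != tense:
                continue
            if tense == "past" or token["ambiguous"]:
                token["tag"] = "NOUN"
                excess -= 1
    return tokens
```

In a sentence like "The monitor controls the job control language," both "controls" and "control" start out as verbs; with one clause there is one excess verb, and the rightmost ambiguous present tense form ("control") is re-tagged as a noun.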

Subject-verb Inversion (QFORM). QFORM performs subject-verb inversion
according to the following rules:

1. If there are any conjunctions in the sentence, take the QFORM
of the first part of the sentence and append the part with conjunctions.

2. If there is an auxiliary in the sentence, move it to the front
of the sentence, and return. Look for a third person singular present
tense verb. Replace it by its infinitive form and put "does" at the beginning
of the sentence and return.

3. Look for a present tense verb and, if found, put the word "do"
at the beginning of the sentence and return.

4. Look for a past tense verb and, if found, put the word "did" at
the beginning of the sentence and return.

5. If none of the above work, set the global variable QFORMERROR,
which signals failure to the system.
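
The rule cascade can be written as a short function over tagged words. This Python sketch is an illustration, not the original LISP routine: the tag names (AUX, V3SG, VPRES, VPAST, CONJ), the crude infinitive helper, and the use of an exception in place of the global QFORMERROR flag are all assumptions. Rule 4 is followed literally, so the past tense verb itself is left unchanged.

```python
class QFormError(Exception):
    """Raised when no rule applies (the report sets a QFORMERROR flag instead)."""

def infinitive(verb):
    """Crude stem of a third-person-singular verb; irregulars need a dictionary."""
    if verb.endswith("ies"):
        return verb[:-3] + "y"
    return verb[:-1]

def qform(tokens):
    """Invert a clause given as (word, tag) pairs; return a list of words."""
    # Rule 1: invert only the part before the first conjunction.
    for i, (_, tag) in enumerate(tokens):
        if tag == "CONJ":
            return qform(tokens[:i]) + [w for w, _ in tokens[i:]]
    words = [w for w, _ in tokens]
    tags = [t for _, t in tokens]
    # Rule 2: front an auxiliary, or use "does" with a third singular verb.
    if "AUX" in tags:
        i = tags.index("AUX")
        return [words[i]] + words[:i] + words[i + 1:]
    if "V3SG" in tags:
        i = tags.index("V3SG")
        return ["does"] + words[:i] + [infinitive(words[i])] + words[i + 1:]
    # Rules 3 and 4: prepend "do" or "did" as the text describes.
    if "VPRES" in tags:
        return ["do"] + words
    if "VPAST" in tags:
        return ["did"] + words
    # Rule 5: signal failure.
    raise QFormError("no verb found")
```

Applied to the tagged words of "the dd name identifies a DD statement," the function yields "does the dd name identify a DD statement."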

Paragraph Recognition.

The input-output routines for AUTOQUEST turn out to be extremely
complex and require nearly 50% of the processing time. The difficulty lies
in recognizing exactly what a paragraph is. Usually the first word of
a paragraph is indented, but sometimes it is not. If a paragraph is
indented, this fact must be distinguished from the indentions of every
line which are associated with margins. Some paragraphs are identified
by letters or numbers preceding the first word. On the other hand, we
do not want to treat a table of contents as if it were a paragraph. Some
paragraphs are recognizable by an extra blank line between paragraphs,
but the recognition algorithm must allow for double- or multiple-line
spacing. Section headings or titles should not be treated as sentences.
Copyright notices and similar paraphernalia are not good candidates for
instruction and need to be screened out automatically.

Various heuristics have been devised to screen out noise from the
text which should be processed. For example, a table of contents can
usually be recognized by embedded blanks or periods in the middle of the
line. Titles are usually eliminated by screening out "sentences" of
fewer than five words.

The program stores each paragraph it reads in two forms: (1) as a
list of sentences and (2) as a string of characters. Before generating
a question, the program presents the paragraph to the student as a string,
exactly as it was read. Then a sentence is randomly selected from the
paragraph list. If it has already been used for a question, it may be
rejected and a new sentence selected. The QGENR routine is executed to
produce a question-answer pair.
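
A few of these heuristics are easy to sketch in Python. The sketch below assumes paragraphs are separated by blank lines (it does not handle double-spaced text or indentation, which the report singles out as the hard cases), and it always skips a sentence that was already used, whereas the report says only that such a sentence "may be" rejected. All function names are hypothetical.

```python
import random
import re

def is_noise(line):
    """Contents-like line: a run of periods, or embedded blanks inside the line."""
    return re.search(r"\.{3,}|\S\s{3,}\S", line) is not None

def split_paragraphs(text):
    """Split on blank lines and drop lines that look like a table of contents."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.splitlines()
                 if ln.strip() and not is_noise(ln)]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs

def sentences(paragraph, min_words=5):
    """Split into sentences, screening out titles of fewer than five words."""
    parts = re.split(r"(?<=[.?!])\s+", paragraph)
    return [s for s in parts if len(s.split()) >= min_words]

def pick_sentence(candidates, used, rng=random):
    """Randomly choose a sentence not yet used for a question, or None."""
    fresh = [s for s in candidates if s not in used]
    if not fresh:
        return None
    choice = rng.choice(fresh)
    used.add(choice)
    return choice
```

The paragraph string returned by split_paragraphs would be shown to the student, and the sentence returned by pick_sentence would be handed to the question generator.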



RESULTS

The AUTOQUEST program was tested on a set of abstracts of technical
reports from the Stanford University Artificial Intelligence Laboratory 
and on a page of text from an IBM programmer's manual. An example of 
a study session using AUTOQUEST appears in the appendix and illustrates 
both successful and unsuccessful interactions. 

Of the 50 generated questions, 34 were judged to be satisfactory.
The 16 unsatisfactory questions were classified into four groups: (1)
eleven syntactic errors, (2) three semantic errors, (3) one pedagogical
error, and (4) one program bug. The syntactic and semantic errors
are listed in full in Tables 2 and 3. The program bug was one in which
the value DET was replaced by NOUN on the property list for the word "the," 
causing the program to generate a question in which the answer was the 
single word "the." The one error classified as pedagogical was the ques- 
tion, "What do we describe?," which could have been answered by two dif- 
ferent sentences in the same paragraph. 

TABLE 2

Examples of Syntactic Errors with AUTOQUEST



1. SENTENCE: There are two usual characterizations of the least
   fixed-point of a continuous function.

   QUESTION: What are two usual characterizations of the least
   fixed-point of a continuous function?

2. SENTENCE: We present two different first order axiomatizations of
   the metamathematics of the logic which FOL itself checks and show
   several proofs using each one.

   QUESTION: What kind of proofs using each one?

3. SENTENCE: "On Automating the Construction of Programs," 63 pages,
   May 1974.

   QUESTION: What May 1974?

4. SENTENCE: Methods are included to detect some types of unmatchable
   target areas in the original data and for detecting when a supposed
   match is invalid.

   QUESTION: What happens with methods when a supposed match is invalid?

5. SENTENCE: Input to the memory has the form of analyzed conceptual
   dependency graphs which represent the underlying meaning of language
   utterances.

   QUESTION: What input meaning of language utterances?

6. SENTENCE: Then it looks at these lines and eliminates lines which
   cannot match any of the model lines.

   QUESTION: What cannot it eliminates lines which match?

7. SENTENCE: The method, embodied in a computer program, generates a
   complete list of isomers.

   QUESTION: What embodied generates a complete list of isomers?

8. SENTENCE: The program has the ability to create, destroy, and even
   resurrect objects in its world.

   QUESTION: What does the even resurrect object in?

9. SENTENCE: This progress report covers the first year and one half
   of work by our automatic programming research group at the Stanford
   Artificial Intelligence Laboratory.

   QUESTION: What work programming research group at the Stanford
   Artificial Intelligence Laboratory?

10. SENTENCE: The user retains control at all times.

    QUESTION: Where does the user retains control?

11. SENTENCE: COPILOT uses the multiple processing facilities to its
    advantage to achieve a kind of interactive control which we have
    termed <non-preemptive>.

    QUESTION: What does COPILOT uses the multiple process facilities
    to its advantage to do?

TABLE 3

Examples of Semantic Errors with AUTOQUEST



1. SENTENCE: The primary goal of comprehension is always to find
   meanings as soon as possible.

   QUESTION: How soon is the primary goal of comprehension always
   to find meanings?

2. SENTENCE: Although FACT uses substantially more main memory than
   MACRO-10, it assembles typical programs about five times
   faster.

   QUESTION: What does it assemble although FACT uses substantially
   more main memory than MACRO-10?

3. SENTENCE: The program is reproduced in full.

   QUESTION: What is the program reproduced in?



The semantic errors seem to involve generating questions which imply 
a concrete answer where the source sentences involve abstract or idiomatic 
answers. Perhaps a simple addition of semantic markers (abstract-concrete, 
human-nonhuman, etc.) would prevent most such errors. 

Nearly 69% of the errors were syntactic. By far the most common
error was misidentification of the verb in the sentence. The source
sentences are not themselves syntactically ambiguous. The problem seems
to be that the pattern-matcher throws away most of the syntactic
information in the sentence which a complete parser would utilize.

A second type of syntactic error, illustrated by sentences 4 and 6
in Table 2, is to mistake the level of a subordinate clause. For example,
the when clause in sentence 4 is assumed to modify "included," but actually
modifies "detecting." The pattern-matching approach is insensitive to
levels of sentence embedding.

CONCLUSIONS AND RECOMMENDATIONS

The pattern-matching system works best on relatively simple sentences. 
It assumes that the verb which it finds is the main verb of the sentence, 
not part of some embedded sentence. The relative clauses, prepositional 
phrases, infinitive verb complements, and subordinate clauses which it 
is asking for must modify constituents of the top level of the sentence 
and not embedded sentences. 

The system would work much better if a reliable parser could be found 
to serve as the front-end of the pattern matcher. It would not be necessary 
to parse the sentences down to the lowest level. All that is required 
is that the parser correctly identify the top level constituents of the 
sentence. A satisfactory parser for our purposes must have the following 
properties:

1. It must have complete error recovery capabilities. 

2. It must be able to identify the top level constituents of the 
sentence. 

3. It must be able to flag cases of multiple parsings if they occur 
at the top level. 

4. It must be able to determine the grammatical category of unknown 
words from their usage. 

The most important requirement is error recovery. It is not necessary 
for the parser to work all the time, provided we know when it is not 
working. 

Since questions need to be generated only once or twice per paragraph, 
a great many sentences can be rejected if the parser encounters an error 
or ambiguity in them. About the worst thing a parser can do is generate 
a system diagnostic which forces a break. The next worst thing it can
do is pretend it has not encountered an error. 

There is some reason to hope that a reliable parser meeting the above 
specifications will be available in about 2 years, in which case some
very simple modifications to the AUTOQUEST system will improve its
performance. For example, Kaplan and Kay have been working on a general
syntactic processor which handles syntactic ambiguity in an economical format
(Kaplan, 1973), and Burton (personal communication) has done some prelimin- 
ary work on a grammar compiler which runs the NASA grammar about ten times 
faster than Woods' original system and which has improved debugging facili- 
ties (Burton, 1975). 

For the near future, these parsers will have to operate with grammars 
that are capable of describing only 30 to 50% of English text sentences. 
Very little effort or funding is being expended on developing more complete
computerized grammars of English.

The results with the present version of the system are sufficiently
encouraging, however, so that a test of the pedagogical effectiveness of
AUTOQUEST could and should be carried out in the future.

APPENDIX
Sample AUTOQUEST Dialogue

<gsp>quest



AUTOQUEST - AN AID TO INDEPENDENT STUDY. 
TO BEGIN, TYPE (STUDY)
2-JUN-75 ... 



Good evening. 
_(study)



WELCOME TO AUTOQUEST - AN AUTOMATIC STUDY AID.

WOULD YOU LIKE SOME INSTRUCTIONS ON HOW TO USE THE SYSTEM? yes 
AUTOQUEST 

AUTOMATIC QUESTION GENERATION AND RESPONSE EVALUATION 

AUTOQUEST IS DESIGNED AS AN AID TO INDEPENDENT STUDY. IT WILL
DISPLAY ANY DISK FILE, A PARAGRAPH AT A TIME, AND ASK YOU QUESTIONS
ABOUT IT. AUTOQUEST REQUIRES THAT YOUR ANSWER MATCH WORD FOR WORD
WHAT IS IN THE TEXT YOU HAVE READ. EACH ANSWER YOU TYPE SHOULD BE
FOLLOWED BY A PERIOD. (IF THE ANSWER ENDS IN A NUMBER, FOLLOW IT
WITH A SPACE AND THEN A PERIOD.) DON'T USE PARENTHESES IN ANYTHING
YOU TYPE. WHEN YOU ARE FINISHED WITH YOUR STUDY, TYPE GOODBYE.

OCCASIONALLY, AUTOQUEST WILL ASK A QUESTION THAT DOESN'T MAKE
SENSE. IN THAT CASE, JUST TYPE "SKIP." AND IT WILL GIVE YOU ANOTHER
QUESTION.

GOOD LUCK! 



END OF FILE 
<WOLFE>INTRO.;7

WHAT FILE WOULD YOU LIKE TO STUDY? <gsp>aini2l I 



AIM-211, CS-383, AD769673,
Yorick Wilks,

"Natural Language Inference"
24 pages, September 1973.



The paper describes the way in which a Preference Semantics system 
for natural language analysis and generation tackles a difficult 
class of anaphoric inference problems (finding the correct referent
for an English pronoun in context): those requiring either analytic
(conceptual) knowledge of a complex sort, or requiring weak 
inductive knowledge of the course of events in the real world. The 
method employed converts all available knowledge to a canonical 
template form and endeavors to create chains of non-deductive 
inferences from the unknowns to the possible referents. Its method 
of selecting among possible chains of inferences is consistent with
the overall principle of 'semantic preference' used to set up the
original meaning representation, of which these anaphoric inference 
procedures are a manipulation. 



ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? sure



WHAT DOES THE METHOD EMPLOYED ENDEAVOR TO DO?

create chains of non-deductive inferences from the unknowns

to the possible referents.

CORRECT 



AIM-212, CS-384, AD769379, 
Annette Herskovits,

"The Generation of French from a Semantic Representation"
20 pages, September 1973.



The report contains first a brief description of Preference
Semantics, a system of representation and analysis of the meaning
structure of natural language. The analysis algorithm which
transforms phrases into semantic items called templates has been
considered in detail elsewhere, so this report concentrates on the
second phase of analysis, which binds templates together into a
higher level semantic block corresponding to an English paragraph,
and which, in operation, interlocks with the French generation
procedure. During this phase, the semantic relations between
templates are extracted, pronouns are referred and those word
disambiguations are done that require the context of a whole
paragraph. These tasks require items called <paraplates> which are
attached to keywords such as prepositions, subjunctions and relative
pronouns. The system chooses the representation which maximizes a
carefully defined 'semantic density'.



ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? ok 



WHAT CHOOSES THE REPRESENTATION WHICH MAXIMIZES A CAREFULLY DEFINED
'SEMANTIC DENSITY' ?

the author of the report. 

WRONG-- SYSTEM / TRY AGAIN



The report contains first a brief description of Preference 
Semantics, a system of representation and analysis of the meaning 
structure of natural language. The analysis algorithm which
transforms phrases into semantic items called templates has been
considered in detail elsewhere, so this report concentrates on the
second phase of analysis, which binds templates together into a
higher level semantic block corresponding to an English paragraph,
and which, in operation, interlocks with the French generation
procedure. During this phase, the semantic relations between
templates are extracted, pronouns are referred and those word
disambiguations are done that require the context of a whole
paragraph. These tasks require items called <paraplates> which are
attached to keywords such as prepositions, subjunctions and relative
pronouns. The system chooses the representation which maximizes a
carefully defined 'semantic density'.



ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? y



WHAT CHOOSES THE REPRESENTATION WHICH MAXIMIZES A CAREFULLY DEFINED
'SEMANTIC DENSITY' ?
the system. 
CORRECT 



A system for the generation of French sentences is described, based
on the recursive evaluation of procedural generation patterns called
<stereotypes>. The stereotypes are semantically context sensitive, are
attached to each sense of English words and keywords and are carried
into the representation by the analysis procedure. The
representation of the meaning of words, and the versatility of the
stereotype format, allow for fine meaning distinctions to appear in
the French, and for the construction of French differing radically
from the English origin.



AIM-213, CS-385,
Ravindra B. Thosar,

"Recognition of Continuous Speech: Segmentation and Classification
using Signature Table Adaptation" 
37 pages, September 1973. 



This report explores the possibility of using a set of features for
segmentation and recognition of continuous speech. The features are
not necessarily <distinctive> or <minimal> in the sense that they do
not divide the phonemes into mutually exclusive subsets, and can
have high redundancy. This concept of feature can thus avoid
a priori binding between the phoneme categories to be recognized and
the set of features defined in a particular system.

ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? yes



WHAT CAN THE FEATURES HAVE ?
high redundancy.
CORRECT 



An adaptive technique is used to find the probability of the 
presence of a feature. Each feature is treated independently of 
other features. An unknown utterance is thus represented by a 
feature graph with associated probabilities. It is hoped that such a
representation would be valuable for a hypothesize-test paradigm as
opposed to a one which operates on a linear symbolic input.

ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? yes 



WHAT IS EACH FEATURE TREATED INDEPENDENTLY OF ? 

other features. 

CORRECT 



AIM-214, CS-386, AD767332,
Walter A. Perkins, Thomas O. Binford,
"A Corner Finder for Visual Feedback"
59 pages, September 1973.



In visual-feedback work often a model of an object and its
approximate location are known and it is only necessary to determine
its location and orientation more accurately. The purpose of the
program described herein is to provide such information for the case
in which the model is an edge or corner. Given a model of a line or
a corner with two or three edges, the program searches a TV window
of arbitrary size looking for one or all corners which match the
model. A model-driven program directs the search. It calls on
another program to find all lines inside the window. Then it looks
at these lines and eliminates lines which cannot match any of the
model lines. It next calls on a program to form vertices and then
checks for a matching vertex. If this simple procedure fails, the
model-driver has two backup procedures. First it works with the
lines that it has and tries to form a matching vertex (corner). If
this fails, it matches parts of the model with vertices and lines
that are present and then takes a careful look in a small region in
which it expects to find a missing line. The program often finds
weak contrast edges in this manner. Lines are found by a global
method after the entire window has been scanned with the Hueckel
edge operator.



ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? sure.



WHAT KIND OF - DRIVEN PROGRAM DIRECTS THE SEARCH ?
a model-driven one.
WRONG-- MODEL / TRY AGAIN



In visual-feedback work often a model of an object and its
approximate location are known and it is only necessary to determine
its location and orientation more accurately. The purpose of the
program described herein is to provide such information for the case
in which the model is an edge or corner. Given a model of a line or
a corner with two or three edges, the program searches a TV window
of arbitrary size looking for one or all corners which match the
model. A model-driven program directs the search. It calls on
another program to find all lines inside the window. Then it looks
at these lines and eliminates lines which cannot match any of the
model lines. It next calls on a program to form vertices and then
checks for a matching vertex. If this simple procedure fails, the
model-driver has two backup procedures. First it works with the
lines that it has and tries to form a matching vertex (corner). If
this fails, it matches parts of the model with vertices and lines
that are present and then takes a careful look in a small region in
which it expects to find a missing line. The program often finds
weak contrast edges in this manner. Lines are found by a global
method after the entire window has been scanned with the Hueckel
edge operator.



ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? yes



WHAT KIND OF - DRIVEN PROGRAM DIRECTS THE SEARCH ? 

a model driven one.

CORRECT 



AIM-215, CS-387, AD769380,
Bruce G. Buchanan, N. S. Sridharan,

"Analysis of Behavior of Chemical Molecules: Rule Formation on
Non-homogeneous Classes of Objects"
15 pages, September 1973.



An information processing model of some important aspects of 
inductive reasoning is presented within the context of one 
scientific discipline. Given a collection of experimental (mass
spectrometry) data from several chemical molecules the computer
program described here separates the molecules into <well-behaved>
subclasses and selects from the space of all explanatory processes
the <characteristic> processes for each subclass. The definitions of
<well-behaved> and <characteristic> embody several heuristics which
are discussed. Some results of the program are discussed which have
been useful to chemists and which lend credibility to this approach. 



ARE YOU READY TO ANSWER A QUESTION ON WHAT YOU HAVE JUST READ? y 



WHAT IS AN INFORMATION PROCESSING MODEL OF SOME IMPORTANT ASPECTS OF 

A systematic method of identification of all possible graph isomers
consistent with a given empirical formula is described. The method,
embodied in a computer program, generates a complete list of isomers.
Duplicate structures are avoided prospectively.

A systematic method of identification of all possible graph isomers
consistent with a given empirical formula is described. The method,
embodied in a computer program, generates a complete list of isomers.
Duplicate structures are avoided prospectively.

WHAT IS DESCRIBED ?

a systematic method of identification of all possible graph 

isomers for the same empirical formula. 

A computer program has been written that successfully discovers
syntheses for complex organic chemical molecules. The definition of
the search space and strategies for heuristic search are described in
this paper.